Estimating the total genome length of a metagenomic sample using k-mers


Similar resources

Clustering metagenomic reads using spaced k-mers

With the emergence of next-generation sequencing technologies, the classification of short reads in a metagenomic sample has become an important yet difficult task. Several tools attempt to tackle this problem with each having a strong point in certain situations. Herein, a novel method is proposed that has its strong point in processing short reads. It is based on two new concepts: utilizing m...

dna2vec: Consistent vector representations of variable-length k-mers

One ubiquitous representation of a long DNA sequence is its division into shorter k-mer components. Unfortunately, the straightforward encoding of a k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet, any pair of one-hot vectors is equidistant. This is particularly problematic when applying the latest machine learning algorithms to so...

Indexing Arbitrary-Length k-Mers in Sequencing Reads

We propose a lightweight data structure for indexing and querying collections of NGS read data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive with the existing algorithms in the space...

CoMeta: Classification of Metagenomes Using k-mers

Nowadays, the study of environmental samples has been developing rapidly. Characterization of the environment composition broadens the knowledge about the relationship between species composition and environmental conditions. An important element of extracting the knowledge of the sample composition is to compare the extracted fragments of DNA with sequences derived from known organisms. In the...

Higher Classification Accuracy of Short Metagenomic Reads by Discriminative Spaced k-mers

The growing number of metagenomic studies in medicine and environmental sciences is creating new computational demands in the analysis of these very large datasets. We have recently proposed a time-efficient algorithm called Clark that can accurately classify metagenomic sequences against a set of reference genomes. The competitive advantage of Clark depends on the use of discriminative contiguo...

Journal

Journal title: BMC Genomics

Year: 2019

ISSN: 1471-2164

DOI: 10.1186/s12864-019-5467-x